    Machine Learning and Genome Annotation: A Match Meant to Be?

    By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.

    Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors

    We propose a method to predict yeast transcription factor targets by integrating histone modification profiles with transcription factor binding motif information. It shows improved predictive power compared to a binding-motif-only method. We find that transcription factors cluster into histone-sensitive and histone-insensitive classes. The target genes of histone-sensitive transcription factors have stronger histone modification signals than those of histone-insensitive ones. The two classes also differ in their tendency to interact with histone modifiers, their degree of connectivity in protein-protein interaction networks, their position in the transcriptional regulation hierarchy, and a number of additional features, indicating possible differences in their transcriptional regulation mechanisms.
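
    As a rough illustration of the kind of integration described (not the paper's actual pipeline), the sketch below combines a hypothetical motif score with hypothetical histone-mark signals in a logistic-regression target predictor and compares it to a motif-only model; a large gain for the combined model is one plausible proxy for calling a factor histone-sensitive. All data and feature names are invented.

        # Sketch: integrating motif scores with histone-modification signals to
        # predict TF targets; a TF whose targets are predicted much better with
        # histone features added would fall into the "histone-sensitive" class.
        # All arrays here are synthetic placeholders.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_genes = 500
        motif = rng.normal(size=(n_genes, 1))        # motif match score per gene
        histone = rng.normal(size=(n_genes, 5))      # five histone-mark signals per gene
        is_target = motif[:, 0] + histone[:, 0] + rng.normal(size=n_genes) > 1

        auc_motif = cross_val_score(LogisticRegression(), motif, is_target,
                                    cv=5, scoring="roc_auc").mean()
        auc_both = cross_val_score(LogisticRegression(),
                                   np.hstack([motif, histone]), is_target,
                                   cv=5, scoring="roc_auc").mean()
        print(f"motif-only AUC={auc_motif:.2f}, motif+histone AUC={auc_both:.2f}")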

    Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data

    We performed computational reconstruction of the in silico gene regulatory networks in the DREAM3 Challenges. Our task was to learn the networks from two types of data, namely gene expression profiles in deletion strains (the ‘deletion data’) and time series trajectories of gene expression after some initial perturbation (the ‘perturbation data’). In the course of developing the prediction method, we observed that the two types of data contained different and complementary information about the underlying network. In particular, deletion data allow for the detection of direct regulatory activities that produce strong responses upon deletion of the regulator, while perturbation data provide richer information for identifying weaker and more complex types of regulation. We applied different techniques to learn the regulation from the two types of data. For deletion data, we used an iterative method to learn a noise model that distinguishes real signals from random fluctuations. For perturbation data, we used differential equations to model the change in a gene's expression level along the trajectories due to the regulation of other genes. We tried different models and combined their predictions. The final predictions were obtained by merging the results from the two types of data. A comparison with the actual regulatory networks suggests that our approach is effective for networks with a range of different sizes. The success of the approach demonstrates the importance of integrating heterogeneous data in network reconstruction.
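
    To make the differential-equation idea concrete, here is a minimal sketch that assumes a simple linear ODE (the challenge models need not be linear) and fits regulatory weights to a simulated perturbation trajectory; the function and data are illustrative, not the authors' method.

        # Sketch: recovering candidate regulatory weights from a perturbation
        # time series by fitting a linear ODE  dx_i/dt = sum_j W[i, j] * x_j
        # with ridge-regularized least squares on finite-difference derivatives.
        import numpy as np

        def fit_network(X, dt, alpha=1e-3):
            """X: (timepoints, genes) trajectory; returns W of shape (genes, genes)."""
            dX = np.diff(X, axis=0) / dt          # finite-difference derivatives
            A = X[:-1]                            # expression at each step
            gram = A.T @ A + alpha * np.eye(A.shape[1])
            return np.linalg.solve(gram, A.T @ dX).T

        rng = np.random.default_rng(1)
        W_true = rng.normal(scale=0.3, size=(5, 5))
        X = np.zeros((50, 5))
        X[0] = rng.normal(size=5)                 # the initial perturbation
        for t in range(49):
            X[t + 1] = X[t] + 0.1 * (X[t] @ W_true.T)   # Euler simulation
        W_hat = fit_network(X, dt=0.1)
        print(np.abs(W_hat - W_true).max())       # small for this noise-free toy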

    LinkHub: a Semantic Web system that facilitates cross-database queries and information retrieval in proteomics

    BACKGROUND: A key abstraction in representing proteomics knowledge is the notion of unique identifiers for individual entities (e.g. proteins) and the massive graph of relationships among them. These relationships are sometimes simple (e.g. synonyms) but are often more complex (e.g. one-to-many relationships in protein family membership). RESULTS: We have built a software system called LinkHub using Semantic Web RDF that manages the graph of identifier relationships and allows exploration with a variety of interfaces. For efficiency, we also provide relational-database access and translation between the relational and RDF versions. LinkHub is practically useful in creating small, local hubs on common topics and then connecting these to major portals in a federated architecture; we have used LinkHub to establish such a relationship between UniProt and the North East Structural Genomics Consortium. LinkHub also facilitates queries and access to information and documents related to identifiers spread across multiple databases, acting as "connecting glue" between different identifier spaces. We demonstrate this with example queries discovering "interologs" of yeast protein interactions in the worm and exploring the relationship between gene essentiality and pseudogene content. We also show how "protein family based" retrieval of documents can be achieved. LinkHub is available at hub.gersteinlab.org and hub.nesg.org with supplement, database models and full source code. CONCLUSION: LinkHub leverages Semantic Web standards-based integrated data to provide novel information retrieval to identifier-related documents through relational graph queries, simplifies and manages connections to major hubs such as UniProt, and provides useful interactive and query interfaces for exploring the integrated data.
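
    As a toy illustration of RDF-based identifier management (with a placeholder namespace and invented predicates, not LinkHub's actual schema), one might store and query an identifier-relationship graph like this:

        # Sketch: an identifier-relationship graph in RDF, queried with SPARQL.
        # The namespace and predicates below are hypothetical examples.
        from rdflib import Graph, Namespace

        EX = Namespace("http://hub.example.org/")   # placeholder namespace
        g = Graph()
        g.add((EX["P12345"], EX.synonymOf, EX["Q67890"]))        # simple 1:1 link
        g.add((EX["P12345"], EX.memberOfFamily, EX["PF00069"]))  # one-to-many link
        g.add((EX["Q67890"], EX.memberOfFamily, EX["PF00069"]))

        # Find all identifiers reachable from P12345 via family membership.
        results = g.query("""
            SELECT ?other WHERE {
                <http://hub.example.org/P12345> <http://hub.example.org/memberOfFamily> ?f .
                ?other <http://hub.example.org/memberOfFamily> ?f .
            }""")
        for row in results:
            print(row.other)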

    A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets

    We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict gene expression of protein coding genes, as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution around genes (upstream or downstream) of distinct chromatin features to the overall prediction of expression levels.
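
    A minimal sketch of this kind of model, assuming made-up binned signals for a single chromatin mark rather than the real modENCODE features: fit a linear model of log expression on the bins around the TSS and read the positional contribution off the coefficients.

        # Sketch: predicting expression from binned chromatin signal around the
        # TSS; per-bin coefficients show where (upstream/downstream) the mark
        # contributes most. Bins and data are illustrative placeholders.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(2)
        n_genes, n_bins = 1000, 8                     # 8 bins spanning -2 kb..+2 kb
        signal = rng.normal(size=(n_genes, n_bins))   # one chromatin mark, binned
        true_w = np.array([0, .1, .4, .8, .6, .2, 0, 0])   # peaks near the TSS
        log_expr = signal @ true_w + rng.normal(scale=0.5, size=n_genes)

        model = LinearRegression().fit(signal, log_expr)
        for i, w in enumerate(model.coef_):
            pos = -2000 + i * 500                     # bin start relative to TSS
            print(f"bin {pos:+6d}..{pos + 500:+6d} bp: weight {w:+.2f}")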

    A web services choreography scenario for interoperating bioinformatics applications

    BACKGROUND: Genome-wide data analysis very often requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate their interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interfaces are not machine-friendly, 3) they use non-standard formats for data input and output, 4) they do not exploit standards to define application interfaces and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. RESULTS: To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH keywords that correlates with the input, grouped by Medical Subject Headings (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate machine processing, we have created a workflow of three web services that replicates the HAPI functionality. These web services use document-style messages, meaning that messages are encoded in an XML-based format. We compared three approaches to implementing an XML-based workflow: a hard-coded Java application, the Collaxa BPEL Server, and the Taverna Workbench. The Java program functions as a web services engine and interoperates with these web services using a web services choreography language (BPEL4WS). CONCLUSION: While it is relatively straightforward to implement and publish web services, the use of web services choreography engines is still in its infancy. However, industry-wide support of and push for web services standards are quickly increasing the chances of success in using web services to unify heterogeneous bioinformatics applications. Given the immaturity of currently available web services engines, it is still most practical to implement a simple, ad hoc XML-based workflow by hard coding it as a Java application. For advanced web services users, the Collaxa BPEL engine provides a configuration and management environment that can fully handle XML-based workflows.
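
    For flavor, here is a minimal hard-coded workflow of the kind the conclusion recommends, written in Python rather than the paper's Java; the three endpoints and XML element names are hypothetical stand-ins, not the real HAPI services.

        # Sketch: a hard-coded three-step workflow passing XML documents
        # between document-style web services. Endpoints and element names
        # are invented placeholders.
        import urllib.request
        import xml.etree.ElementTree as ET

        def call_service(url, payload_xml):
            """POST an XML document and parse the XML response."""
            req = urllib.request.Request(url, data=payload_xml.encode(),
                                         headers={"Content-Type": "text/xml"})
            with urllib.request.urlopen(req) as resp:
                return ET.fromstring(resp.read())

        spot_ids = "<spots><id>A1</id><id>B2</id></spots>"
        genes = call_service("http://example.org/spots-to-genes", spot_ids)
        refs = call_service("http://example.org/genes-to-pubmed",
                            ET.tostring(genes, encoding="unicode"))
        mesh = call_service("http://example.org/pubmed-to-mesh",
                            ET.tostring(refs, encoding="unicode"))
        for term in mesh.iter("term"):            # MeSH keywords by category
            print(term.get("category"), term.text)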

    Identification of specificity determining residues in peptide recognition domains using an information theoretic approach applied to large-scale binding maps

    BACKGROUND: Peptide Recognition Domains (PRDs) are commonly found in signaling proteins. They mediate protein-protein interactions by recognizing and binding short motifs in their ligands. Although a great deal is known about PRDs and their interactions, prediction of PRD specificities remains largely an unsolved problem. RESULTS: We present a novel approach to identifying the Specificity Determining Residues (SDRs) of PRDs. Our algorithm generalizes earlier information-theoretic approaches to coevolution analysis so that they become applicable to this problem. It leverages the growing wealth of binding data between PRDs and large numbers of random peptides, and searches for PRD residues that exhibit strong evolutionary covariation with some positions of the statistical profiles of bound peptides. The calculations involve only sequence information, and thus can be applied to PRDs without crystal structures. We applied the approach to PDZ, SH3 and kinase domains, and evaluated the results using both residue proximity in co-crystal structures and verified binding specificity maps from mutagenesis studies. DISCUSSION: Our predictions were found to be strongly correlated with the physical proximity of residues, demonstrating the ability of our approach to detect physical interactions of the binding partners. Some high-scoring pairs were further confirmed to affect binding specificity in previous experimental results. Combining the covariation results also allowed us to predict binding profiles with higher reliability than two other methods that do not explicitly take residue covariation into account. CONCLUSIONS: The general applicability of our approach to the three different domain families demonstrated in this paper suggests its potential in predicting binding targets and assisting the exploration of binding mechanisms.
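
    A toy sketch of the core covariation signal, using plain mutual information between two aligned columns; the paper's statistic operates on binding-profile data and is more general, and the sequences below are invented.

        # Sketch: scoring covariation between a domain-alignment column and a
        # bound-peptide column with mutual information; a high-MI domain
        # position is a candidate specificity determining residue (SDR).
        from collections import Counter
        from math import log2

        def mutual_information(col_a, col_b):
            """MI in bits between two aligned columns given as equal-length strings."""
            n = len(col_a)
            pa, pb = Counter(col_a), Counter(col_b)
            pab = Counter(zip(col_a, col_b))
            return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
                       for (a, b), c in pab.items())

        # One domain-residue column paired with one peptide-position column,
        # across eight domain/ligand pairs; these covary perfectly.
        domain_col = "DDEEDDEE"
        peptide_col = "KKRRKKRR"
        print(f"MI = {mutual_information(domain_col, peptide_col):.2f} bits")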